

# README


## Description

The file rfs\_master.do contains the sequential list of do files for the project. Running it takes all data stored in the data/ directory, cleans the relevant FamilySearch records, performs all necessary merges, and creates the tables and figures from the paper.

The entire project runs from start to finish in Stata 17 and should be backward compatible with Stata 15. All non-core Stata packages are stored in the libraries/ subdirectory. If the do files are run individually, these libraries must first be added to the ado path. The following code, run from the project root directory, does this and also creates the output/ and working/ directories:

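```stata
* Run from the project root directory
local cdir = c(pwd)

* Put the bundled non-core packages at the front of the ado path
adopath ++ "`cdir'/libraries/stata"

* Create the output/ and working/ directories (capture ignores the error if they exist)
capture mkdir output
capture mkdir working
```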


## Data
All necessary data files are included in the data/ directory, with the exception of "United\_States\_Freedmans\_Bank\_Records\_1865-1874\_CID1417695.txt". This file contains the transcribed new depositor records from FamilySearch.org, which can make it available to researchers. In its place, the data/ directory contains a pseudo-dataset, "United\_States\_Freedmans\_Bank\_Records\_1865-1874\_CID1417695\_pseudo.txt", which retains the record ID variables but randomizes the remaining integer data and redacts named text fields. Importing it creates a dataset whose variables, data types, and value formats match those of the master data.
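The README does not document how the pseudo-dataset was built beyond the description above. As a rough illustration of that description only, the sketch below applies the same three rules to in-memory rows: ID columns are kept, columns whose non-missing values all parse as integers are redrawn at random, and every other non-empty field is redacted. The function name `pseudonymize`, the `REDACTED` marker, drawing integers uniformly within each column's observed range, and leaving missing values blank are all assumptions, not the procedure used to produce the provided file.

```python
import random


def _is_int(text):
    """True if `text` parses as a (possibly signed) integer."""
    try:
        int(text)
        return True
    except ValueError:
        return False


def pseudonymize(rows, id_fields, seed=0, marker="REDACTED"):
    """Return a pseudonymized copy of `rows` (a list of dicts of strings).

    Hypothetical helper, not part of the project:
    - fields in `id_fields` are copied unchanged;
    - fields whose non-empty values all parse as integers are redrawn
      uniformly within the column's observed min..max (assumed rule);
    - every other non-empty field is replaced with `marker`.
    Empty strings stay empty, so the pattern of missing values is kept.
    """
    rng = random.Random(seed)
    fields = list(rows[0]) if rows else []

    # Find the integer columns and their observed ranges.
    int_ranges = {}
    for name in fields:
        if name in id_fields:
            continue
        vals = [r[name] for r in rows if r.get(name, "") != ""]
        if vals and all(_is_int(v) for v in vals):
            nums = [int(v) for v in vals]
            int_ranges[name] = (min(nums), max(nums))

    out = []
    for r in rows:
        new = {}
        for name, value in r.items():
            if name in id_fields or value == "":
                new[name] = value                      # keep IDs and blanks
            elif name in int_ranges:
                lo, hi = int_ranges[name]
                new[name] = str(rng.randint(lo, hi))   # same type, random value
            else:
                new[name] = marker                     # redact free text
        out.append(new)
    return out
```

Because every value stays a string of the same kind (integer text, redacted text, or blank), importing the result yields the same variable types as the original, which is the property the pseudo-dataset is meant to preserve. Rows could be read with `csv.DictReader`, using whatever delimiter the source file actually uses.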


Data Files:

  * United\_States\_Freedmans\_Bank\_Records\_1865-1874\_CID1417695.txt (obtainable from FamilySearch.org)
  * signatory\_record\_gap\_details.csv
  * family\_search\_valid\_dates.csv
  * branch\_data.dta
  * passbook\_activity\_merged.csv
  * dividend\_records.csv
  * interest\_payments\_valid\_balance.csv
  * geocoded\_locations.dta
  * residence\_tagged.dta (Link file matched to whether residence was in-town or not)
  * wbu\_tagged.dta (Link file matched to "Where Brought Up")
  * bp\_tagged.dta (Link file matched to "Birth Place")




## Algorithmically Cleaning FamilySearch Data
The program clean\_family\_search.do takes the raw FamilySearch data file, which contains all records from the new depositor ledgers, and cleans it. The transcriptions are generally of high quality, but establishing clean dated sequences requires care: some records may be out of order, and they must be placed in the correct order even where their dates are ambiguous. Part of this process runs a Python script (subseq.py) that finds the longest increasing subsequence in each roll. The script is executed through the Python-Stata integration available in Stata 17 onwards, but it also runs under any stock Python 3 interpreter.
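subseq.py itself is not reproduced here. The following is a minimal sketch of the longest increasing subsequence technique applied per roll, assuming each record has already been reduced to a `(roll, ledger_position, date_ordinal)` tuple. The function names, that record layout, and the default of treating equal dates as in sequence (non-decreasing, since several accounts can open on the same day) are hypothetical choices and may differ from what subseq.py does.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def longest_increasing_subsequence(values, strict=False):
    """Return the indices of one longest increasing subsequence of `values`.

    strict=False allows ties (non-decreasing); strict=True requires each
    value to exceed the previous one. Uses the O(n log n) patience-sorting
    method.
    """
    search = bisect_left if strict else bisect_right
    tails = []                   # tails[k]: smallest tail of any run of length k+1
    tail_idx = []                # position in `values` of tails[k]
    parent = [-1] * len(values)  # predecessor of each element in its best run
    for i, v in enumerate(values):
        k = search(tails, v)     # v extends a run of length k
        if k > 0:
            parent[i] = tail_idx[k - 1]
        if k == len(tails):
            tails.append(v)
            tail_idx.append(i)
        else:
            tails[k] = v
            tail_idx[k] = i
    # Walk back from the end of the longest run to recover its indices.
    out = []
    i = tail_idx[-1] if tail_idx else -1
    while i != -1:
        out.append(i)
        i = parent[i]
    return out[::-1]


def in_sequence_records(records, strict=False):
    """records: iterable of (roll, ledger_position, date_ordinal) tuples.

    Returns the set of (roll, ledger_position) pairs that lie on a longest
    increasing subsequence of dates within their roll; the remaining
    records are candidates for out-of-order entries.
    """
    by_roll = defaultdict(list)
    for roll, pos, date in records:
        by_roll[roll].append((pos, date))
    keep = set()
    for roll, rows in by_roll.items():
        rows.sort()  # ledger order within the roll
        dates = [d for _, d in rows]
        for j in longest_increasing_subsequence(dates, strict):
            keep.add((roll, rows[j][0]))
    return keep


if __name__ == "__main__":
    # Toy data: dates as day numbers (e.g. from datetime.date.toordinal()).
    records = [(1, 1, 10), (1, 2, 12), (1, 3, 5), (1, 4, 13), (1, 5, 13),
               (2, 1, 3), (2, 2, 1), (2, 3, 4)]
    print(sorted(in_sequence_records(records)))
```

On the toy data this keeps positions 1, 2, 4, and 5 of roll 1 and positions 2 and 3 of roll 2, leaving roll 1's position 3 and roll 2's position 1 as out-of-sequence candidates. Note that roll 2 has two equally long subsequences, (3, 4) and (1, 4); the patience-sorting method picks one of them deterministically, so a real cleaning step would still need rules for such ambiguous cases.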



## Data Source Descriptions

The data for this project comes from three primary sources: the new account indices, the dividend payment records, and the passbook records. These are supplemented by records gathered from congressional reports. Each of these data sources is described in more detail at [https://freedmansbank.uga.edu](https://freedmansbank.uga.edu).





